NumPy Basics. Working with multidimensional arrays

Basic tutorials:


In [2]:
import numpy as np
from StringIO import StringIO

Loading data into numpy


In [79]:
!head wine_names.csv


Class, a, b, c, d, e, f, g, h, i, j, k, l, m
1,14.23,1.71,2.43,15.6,127,2.8,3.06,.28,2.29,5.64,1.04,3.92,1065
1,13.2,1.78,2.14,11.2,100,2.65,2.76,.26,1.28,4.38,1.05,3.4,1050
1,13.16,2.36,2.67,18.6,101,2.8,3.24,.3,2.81,5.68,1.03,3.17,1185
1,14.37,1.95,2.5,16.8,113,3.85,3.49,.24,2.18,7.8,.86,3.45,1480
1,13.24,2.59,2.87,21,118,2.8,2.69,.39,1.82,4.32,1.04,2.93,735
1,14.2,1.76,2.45,15.2,112,3.27,3.39,.34,1.97,6.75,1.05,2.85,1450
1,14.39,1.87,2.45,14.6,96,2.5,2.52,.3,1.98,5.25,1.02,3.58,1290
1,14.06,2.15,2.61,17.6,121,2.6,2.51,.31,1.25,5.05,1.06,3.58,1295
1,14.83,1.64,2.17,14,97,2.8,2.98,.29,1.98,5.2,1.08,2.85,1045

In [71]:
data = np.genfromtxt("wine_names.csv", dtype=None, delimiter=',', skip_header=1)

In [3]:
data = np.genfromtxt("wine_names.csv", dtype=float, delimiter=',', skip_header=1)
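The loading step can be tried without the wine_names.csv file by pointing genfromtxt at an in-memory text buffer. The sketch below assumes Python 3, where StringIO lives in the io module, and uses only the header and the first two rows printed by head above. The names sample and sample_data are illustrative.

```python
import io
import numpy as np

# Header plus the first two rows of wine_names.csv, as printed by `head` above.
sample = """Class, a, b, c, d, e, f, g, h, i, j, k, l, m
1,14.23,1.71,2.43,15.6,127,2.8,3.06,.28,2.29,5.64,1.04,3.92,1065
1,13.2,1.78,2.14,11.2,100,2.65,2.76,.26,1.28,4.38,1.05,3.4,1050
"""

# skip_header=1 drops the column-name line; dtype=float gives a plain 2-D float array.
sample_data = np.genfromtxt(io.StringIO(sample), dtype=float,
                            delimiter=",", skip_header=1)

print(sample_data.shape)
print(sample_data[0, 1])
```

With dtype=None, genfromtxt infers a type for each column separately. For a file that mixes integer and float columns this gives a structured 1-D array rather than a 2-D float array, so dtype=float is the more convenient choice for the row and column indexing used in this notebook.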

In [4]:
data


Out[4]:
array([[  1.00000000e+00,   1.42300000e+01,   1.71000000e+00, ...,
          1.04000000e+00,   3.92000000e+00,   1.06500000e+03],
       [  1.00000000e+00,   1.32000000e+01,   1.78000000e+00, ...,
          1.05000000e+00,   3.40000000e+00,   1.05000000e+03],
       [  1.00000000e+00,   1.31600000e+01,   2.36000000e+00, ...,
          1.03000000e+00,   3.17000000e+00,   1.18500000e+03],
       ..., 
       [  3.00000000e+00,   1.32700000e+01,   4.28000000e+00, ...,
          5.90000000e-01,   1.56000000e+00,   8.35000000e+02],
       [  3.00000000e+00,   1.31700000e+01,   2.59000000e+00, ...,
          6.00000000e-01,   1.62000000e+00,   8.40000000e+02],
       [  3.00000000e+00,   1.41300000e+01,   4.10000000e+00, ...,
          6.10000000e-01,   1.60000000e+00,   5.60000000e+02]])

Numpy Basics

ndarray.ndim > the number of axes (dimensions) of the array. In the Python world, the number of dimensions is referred to as rank. (Note: this is not the number of columns or features. A table with rows and columns always has ndim = 2, however many columns it has.)

ndarray.shape > the dimensions of the array. This is a tuple of integers indicating the size of the array in each dimension. For a matrix with n rows and m columns, shape will be (n,m). The length of the shape tuple is therefore the rank, or number of dimensions, ndim.

ndarray.size > the total number of elements of the array. This is equal to the product of the elements of shape.

ndarray.dtype > an object describing the type of the elements in the array. One can create or specify dtype's using standard Python types. Additionally NumPy provides types of its own. numpy.int32, numpy.int16, and numpy.float64 are some examples.
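The four attributes above can be checked on a small hand-made array. This is a minimal sketch; the array a is made up for illustration and is not the wine data.

```python
import numpy as np

# Two data points (rows) with three features (columns).
a = np.array([[1.0, 14.23, 1.71],
              [1.0, 13.20, 1.78]])

print(a.ndim)   # number of axes: rows and columns make two axes
print(a.shape)  # size along each axis: (rows, columns)
print(a.size)   # total number of elements, the product of shape
print(a.dtype)  # element type
```

Here ndim is 2 although the array has three columns, because ndim counts axes and not features.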


In [73]:
data.shape


Out[73]:
(178, 14)

178 rows and 14 columns: 178 data points with 14 dimensions each


In [74]:
data.ndim


Out[74]:
2

In [128]:
np.set_printoptions(threshold=np.inf)

Print everything ↑


In [75]:
np.set_printoptions(edgeitems=3,infstr='inf',
linewidth=75, nanstr='nan', precision=8,
suppress=False, threshold=1000, formatter=None)

Default settings ↑

Read out a row (data point) with index n


In [90]:
data[10]


Out[90]:
array([  1.00000000e+00,   1.41000000e+01,   2.16000000e+00,
         2.30000000e+00,   1.80000000e+01,   1.05000000e+02,
         2.95000000e+00,   3.32000000e+00,   2.20000000e-01,
         2.38000000e+00,   5.75000000e+00,   1.25000000e+00,
         3.17000000e+00,   1.51000000e+03])

Read out a column (dimension)


In [92]:
data[:,1]


Out[92]:
array([ 14.23,  13.2 ,  13.16,  14.37,  13.24,  14.2 ,  14.39,  14.06,
        14.83,  13.86,  14.1 ,  14.12,  13.75,  14.75,  14.38,  13.63,
        14.3 ,  13.83,  14.19,  13.64,  14.06,  12.93,  13.71,  12.85,
        13.5 ,  13.05,  13.39,  13.3 ,  13.87,  14.02,  13.73,  13.58,
        13.68,  13.76,  13.51,  13.48,  13.28,  13.05,  13.07,  14.22,
        13.56,  13.41,  13.88,  13.24,  13.05,  14.21,  14.38,  13.9 ,
        14.1 ,  13.94,  13.05,  13.83,  13.82,  13.77,  13.74,  13.56,
        14.22,  13.29,  13.72,  12.37,  12.33,  12.64,  13.67,  12.37,
        12.17,  12.37,  13.11,  12.37,  13.34,  12.21,  12.29,  13.86,
        13.49,  12.99,  11.96,  11.66,  13.03,  11.84,  12.33,  12.7 ,
        12.  ,  12.72,  12.08,  13.05,  11.84,  12.67,  12.16,  11.65,
        11.64,  12.08,  12.08,  12.  ,  12.69,  12.29,  11.62,  12.47,
        11.81,  12.29,  12.37,  12.29,  12.08,  12.6 ,  12.34,  11.82,
        12.51,  12.42,  12.25,  12.72,  12.22,  11.61,  11.46,  12.52,
        11.76,  11.41,  12.08,  11.03,  11.82,  12.42,  12.77,  12.  ,
        11.45,  11.56,  12.42,  13.05,  11.87,  12.07,  12.43,  11.79,
        12.37,  12.04,  12.86,  12.88,  12.81,  12.7 ,  12.51,  12.6 ,
        12.25,  12.53,  13.49,  12.84,  12.93,  13.36,  13.52,  13.62,
        12.25,  13.16,  13.88,  12.87,  13.32,  13.08,  13.5 ,  12.79,
        13.11,  13.23,  12.58,  13.17,  13.84,  12.45,  14.34,  13.48,
        12.36,  13.69,  12.85,  12.96,  13.78,  13.73,  13.45,  12.82,
        13.58,  13.4 ,  12.2 ,  12.77,  14.16,  13.71,  13.4 ,  13.27,
        13.17,  14.13])

Read out a single value from the array

First value (row index 0) of the second column (column index 1)


In [103]:
data[0,1]


Out[103]:
14.23

Second value (row index 1) of the second column (column index 1)


In [104]:
data[1,1]


Out[104]:
13.199999999999999
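The three access patterns used so far (row, column, single value) can be compared side by side on a small array. This is a sketch with a made-up 3 x 3 table that reuses the first values of the wine data; the names table, row, column and value are illustrative.

```python
import numpy as np

table = np.array([[1.0, 14.23, 1.71],
                  [1.0, 13.20, 1.78],
                  [1.0, 13.16, 2.36]])

row = table[1]        # second row (one data point), shape (3,)
column = table[:, 1]  # second column (one dimension across all rows), shape (3,)
value = table[0, 1]   # single element: row 0, column 1

# Basic slicing returns a view onto the same memory, not a copy.
print(np.shares_memory(column, table))
```

Because a slice is a view, writing into column would also change table. Calling column.copy() gives an independent array when that is not wanted.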

With this, printing and accessing the values of one data point is manageable. The next step is to min-max scale all values of one dimension to a parameter range for the SynthDef (sent via OSC messages), such as frequency.
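The scaling idea can be sketched in plain Python before bringing NumPy into it. The helper scale_to_range is hypothetical, and the target range of 200 to 1700 Hz is taken from the playback loops further down in this notebook.

```python
def scale_to_range(values, out_min, out_max):
    """Linearly map values so that min(values) -> out_min and max(values) -> out_max."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # All values identical: fall back to the lower bound (an arbitrary choice).
        return [float(out_min) for _ in values]
    return [out_min + (v - lo) / span * (out_max - out_min) for v in values]

# First five values of the second column shown above, mapped to 200..1700 Hz.
column = [14.23, 13.2, 13.16, 14.37, 13.24]
freqs = scale_to_range(column, 200.0, 1700.0)
print(freqs)
```

The smallest input lands exactly on the lower bound and the largest on the upper bound. Everything in between keeps its relative position.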

Before that, some basics about the communication between IPython and SC3 via OSC.

How to send OSC messages to SC3 with ipython?

Download pyOSC here: https://trac.v2.nl/wiki/pyOSC and run sudo python setup.py install inside the pyOSC folder.

On OSC more here: http://opensoundcontrol.org/introduction-osc

Source: http://www.caseyanderson.com/teaching/ipython-to-supercollider-via-osc/

Load the code below in SC3 (Supercollider): http://supercollider.sourceforge.net/

(
SynthDef("grain", { |out, amp=0.1, freq=440, sustain=0.01, pan|
    var snd = FSinOsc.ar(freq);
    var amp2 = amp * AmpComp.ir(freq.max(50)) * 0.5;
    var env = EnvGen.ar(Env.sine(sustain, amp2), doneAction: 2);
    OffsetOut.ar(out, Pan2.ar(snd * env, pan));
}, \ir ! 5).add;
)

In [26]:
import OSC
import time, random
client = OSC.OSCClient()
client.connect( ( '127.0.0.1', 57110 ) )

OSC message to send: s.sendMsg("s_new", \grain, -1, 0, 1, \freq, 200, \sustain, 0.1, \pan, -1.0);
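To see what such a message looks like on the wire, here is a minimal sketch of the OSC 1.0 binary layout using only the standard library (Python 3 assumed). An OSC message consists of the address string, a type tag string starting with a comma, and the arguments; strings are NUL-terminated and padded to a multiple of four bytes, and numbers are 32-bit big-endian. The helpers osc_string and osc_message are hypothetical and are not part of pyOSC. They only cover int, float and string arguments.

```python
import struct

def osc_string(text):
    """Encode an OSC string: ASCII bytes, a terminating NUL, NUL padding to a multiple of 4."""
    raw = text.encode("ascii") + b"\x00"
    return raw + b"\x00" * (-len(raw) % 4)

def osc_message(address, args):
    """Encode one OSC message with int, float and str arguments."""
    type_tags = ","
    payload = b""
    for arg in args:
        if isinstance(arg, bool):
            raise TypeError("bool is not handled in this sketch")
        if isinstance(arg, int):
            type_tags += "i"
            payload += struct.pack(">i", arg)  # 32-bit big-endian integer
        elif isinstance(arg, float):
            type_tags += "f"
            payload += struct.pack(">f", arg)  # 32-bit big-endian float
        elif isinstance(arg, str):
            type_tags += "s"
            payload += osc_string(arg)
        else:
            raise TypeError("unsupported argument type: %r" % type(arg))
    return osc_string(address) + osc_string(type_tags) + payload

# Same content as the sendMsg call above; the address is kept as written in this notebook.
packet = osc_message("s_new", ["grain", -1, 0, 1, "freq", 200,
                               "sustain", 0.1, "pan", -1.0])
print(len(packet), packet[:12])
```

A library such as pyOSC does this encoding internally, so the sketch is only meant to make the message structure visible.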


In [66]:
msg = OSC.OSCMessage()
msg.setAddress("s_new")
msg.append("grain")
msg.append(-1)
msg.append(0)
msg.append(1)
msg.append("amp")
msg.append(1)
msg.append("freq")
msg.append(4000)
msg.append("sustain")
msg.append(0.1)
msg.append("pan")
msg.append(0)
client.send(msg)

In [186]:
import time, sys
for i in range(100):
    
    msg = OSC.OSCMessage()
    msg.setAddress("s_new")
    msg.append("grain")
    msg.append(-1)
    msg.append(0)
    msg.append(1)
    msg.append("amp")
    msg.append(1)
    msg.append("freq")
    msg.append(440+(i*10))
    msg.append("sustain")
    msg.append(0.15)
    msg.append("pan")
    msg.append(1)
    client.send(msg)
    
    msg = OSC.OSCMessage()
    msg.setAddress("s_new")
    msg.append("grain")
    msg.append(-1)
    msg.append(0)
    msg.append(1)
    msg.append("amp")
    msg.append(1)
    msg.append("freq")
    msg.append(1440+(i*10))
    msg.append("sustain")
    msg.append(0.15)
    msg.append("pan")
    msg.append(-1)
    client.send(msg)
    
    time.sleep(0.04)

Create/define a function


In [188]:
def oscgrain( frequency ):
    msg = OSC.OSCMessage()
    msg.setAddress("s_new")
    msg.append("grain")
    msg.append(-1)
    msg.append(0)
    msg.append(1)
    msg.append("amp")
    msg.append(1)
    msg.append("freq")
    msg.append(frequency)     #read in data points
    msg.append("sustain")
    msg.append(0.015) #0.01-0.04
    msg.append("pan")
    msg.append(-1)
    client.send(msg)

In [190]:
oscgrain(1000)
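The long runs of msg.append calls all add the same flat list of values. As a sketch, that list can be built by one small function so that amp, sustain and pan become parameters as well. The name grain_args is hypothetical, and the defaults simply mirror the values used in oscgrain above.

```python
def grain_args(freq, amp=1, sustain=0.015, pan=-1):
    """Return the s_new argument list for the "grain" SynthDef as a flat list."""
    # SynthDef name, node ID (-1 lets the server choose), add action 0, target group 1.
    args = ["grain", -1, 0, 1]
    for name, value in (("amp", amp), ("freq", freq),
                        ("sustain", sustain), ("pan", pan)):
        args.extend([name, value])
    return args

print(grain_args(1000))
print(grain_args(440, pan=1, sustain=0.15))
```

With pyOSC, each element of the returned list could then be passed to msg.append in a short loop before calling client.send(msg).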

Loading the array


In [4]:
data[0,1]


Out[4]:
14.23

Reading-out values from the array with a timed loop

reading out one data point


In [194]:
data[1,:]


Out[194]:
array([  1.00000000e+00,   1.32000000e+01,   1.78000000e+00,
         2.14000000e+00,   1.12000000e+01,   1.00000000e+02,
         2.65000000e+00,   2.76000000e+00,   2.60000000e-01,
         1.28000000e+00,   4.38000000e+00,   1.05000000e+00,
         3.40000000e+00,   1.05000000e+03])

In [130]:
import time, sys
for i in range (14):
    print data[1,i]
    time.sleep(0.4)


1.0
13.2
1.78
2.14
11.2
100.0
2.65
2.76
0.26
1.28
4.38
1.05
3.4
1050.0

Min max scaling the data

reading-out one dimension


In [153]:
data[:,1]


Out[153]:
array([ 14.23,  13.2 ,  13.16,  14.37,  13.24,  14.2 ,  14.39,  14.06,
        14.83,  13.86,  14.1 ,  14.12,  13.75,  14.75,  14.38,  13.63,
        14.3 ,  13.83,  14.19,  13.64,  14.06,  12.93,  13.71,  12.85,
        13.5 ,  13.05,  13.39,  13.3 ,  13.87,  14.02,  13.73,  13.58,
        13.68,  13.76,  13.51,  13.48,  13.28,  13.05,  13.07,  14.22,
        13.56,  13.41,  13.88,  13.24,  13.05,  14.21,  14.38,  13.9 ,
        14.1 ,  13.94,  13.05,  13.83,  13.82,  13.77,  13.74,  13.56,
        14.22,  13.29,  13.72,  12.37,  12.33,  12.64,  13.67,  12.37,
        12.17,  12.37,  13.11,  12.37,  13.34,  12.21,  12.29,  13.86,
        13.49,  12.99,  11.96,  11.66,  13.03,  11.84,  12.33,  12.7 ,
        12.  ,  12.72,  12.08,  13.05,  11.84,  12.67,  12.16,  11.65,
        11.64,  12.08,  12.08,  12.  ,  12.69,  12.29,  11.62,  12.47,
        11.81,  12.29,  12.37,  12.29,  12.08,  12.6 ,  12.34,  11.82,
        12.51,  12.42,  12.25,  12.72,  12.22,  11.61,  11.46,  12.52,
        11.76,  11.41,  12.08,  11.03,  11.82,  12.42,  12.77,  12.  ,
        11.45,  11.56,  12.42,  13.05,  11.87,  12.07,  12.43,  11.79,
        12.37,  12.04,  12.86,  12.88,  12.81,  12.7 ,  12.51,  12.6 ,
        12.25,  12.53,  13.49,  12.84,  12.93,  13.36,  13.52,  13.62,
        12.25,  13.16,  13.88,  12.87,  13.32,  13.08,  13.5 ,  12.79,
        13.11,  13.23,  12.58,  13.17,  13.84,  12.45,  14.34,  13.48,
        12.36,  13.69,  12.85,  12.96,  13.78,  13.73,  13.45,  12.82,
        13.58,  13.4 ,  12.2 ,  12.77,  14.16,  13.71,  13.4 ,  13.27,
        13.17,  14.13])

Maximum and minimum within one dimension (column)


In [154]:
np.amax((data[:,1]))


Out[154]:
14.83

In [156]:
np.amin((data[:,1]))


Out[156]:
11.029999999999999

In [178]:
np.amax((data[:,1]))-np.amin((data[:,1]))


Out[178]:
3.8000000000000007

reading-out all values of one dimension scaled between 0 and 1


In [14]:
dimension = 12
datanew = (data[:,dimension] - np.amin((data[:,dimension]))) / (np.amax((data[:,dimension]))-np.amin((data[:,dimension])))

In [15]:
datanew


Out[15]:
array([ 0.97069597,  0.78021978,  0.6959707 ,  0.7985348 ,  0.60805861,
        0.57875458,  0.84615385,  0.84615385,  0.57875458,  0.83516484,
        0.6959707 ,  0.56776557,  0.5970696 ,  0.53479853,  0.63369963,
        0.58974359,  0.50549451,  0.47619048,  0.56776557,  0.76556777,
        0.89377289,  0.82417582,  1.        ,  0.86446886,  0.93406593,
        0.70695971,  0.71428571,  0.54945055,  0.78021978,  0.84981685,
        0.52747253,  0.58974359,  0.58608059,  0.63369963,  0.58608059,
        0.80586081,  0.55311355,  0.45421245,  0.52014652,  0.82783883,
        0.77289377,  0.63369963,  0.83882784,  0.63369963,  0.76190476,
        0.75457875,  0.79487179,  0.75457875,  0.54212454,  0.67032967,
        0.6007326 ,  0.76923077,  0.72893773,  0.60805861,  0.70695971,
        0.64468864,  0.74725275,  0.57509158,  0.58608059,  0.2014652 ,
        0.14652015,  0.11721612,  0.43589744,  0.58608059,  0.35164835,
        0.37728938,  0.6996337 ,  0.80952381,  0.24175824,  0.65934066,
        0.2014652 ,  0.69230769,  0.55311355,  0.81684982,  0.68131868,
        0.31868132,  0.44322344,  0.45787546,  0.38095238,  0.68131868,
        0.67765568,  0.68498168,  0.53113553,  0.27106227,  0.66300366,
        0.69230769,  0.36263736,  0.71062271,  0.54212454,  0.71062271,
        0.36630037,  0.50549451,  0.28937729,  0.74358974,  0.61904762,
        0.4981685 ,  0.36263736,  0.53846154,  0.54945055,  0.57142857,
        0.61904762,  0.54945055,  0.77289377,  0.42857143,  0.84249084,
        0.74358974,  0.6959707 ,  0.42124542,  0.64102564,  0.72893773,
        0.56410256,  0.55311355,  0.45054945,  0.38095238,  0.7032967 ,
        0.58608059,  0.75457875,  0.61904762,  0.31135531,  0.65201465,
        0.77655678,  0.88644689,  0.67765568,  0.67032967,  0.86813187,
        0.73626374,  0.57509158,  0.42857143,  0.55311355,  0.47619048,
        0.00732601,  0.05494505,  0.03296703,  0.00732601,  0.08791209,
        0.11355311,  0.        ,  0.15384615,  0.2014652 ,  0.32234432,
        0.38095238,  0.43956044,  0.28937729,  0.28571429,  0.26739927,
        0.15018315,  0.02197802,  0.21611722,  0.12820513,  0.02197802,
        0.01098901,  0.07326007,  0.02197802,  0.08791209,  0.1025641 ,
        0.07692308,  0.13553114,  0.16849817,  0.25274725,  0.18681319,
        0.11355311,  0.2014652 ,  0.30769231,  0.17582418,  0.15018315,
        0.17582418,  0.10622711,  0.17582418,  0.19413919,  0.23809524,
        0.20512821,  0.13186813,  0.16117216,  0.17216117,  0.10622711,
        0.10622711,  0.12820513,  0.12087912])

How to put together all dimensions?


In [6]:
data0 = (data[:,0] - np.amin((data[:,0]))) / (np.amax((data[:,0]))-np.amin((data[:,0])))
data1 = (data[:,1] - np.amin((data[:,1]))) / (np.amax((data[:,1]))-np.amin((data[:,1])))
data2 = (data[:,2] - np.amin((data[:,2]))) / (np.amax((data[:,2]))-np.amin((data[:,2])))

In [17]:
data_all = np.column_stack([data0, data1, data2])

In [5]:
data_all = (data - np.min(data, axis=0))/(np.max(data, axis=0) - np.min(data, axis=0))

or


In [15]:
mn, mx = data.min(0), data.max(0)
data_all = (data - mn)/(mx-mn)
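The one-line versions work because of broadcasting: data has shape (rows, columns), while the per-column minimum and maximum have shape (columns,), so NumPy applies them to every row. The sketch below shows this on a made-up 3 x 3 array and adds a guard for constant columns, where max - min is zero and the plain formula would divide by zero. The function minmax_scale and the guard are additions for illustration and are not part of the original cells.

```python
import numpy as np

def minmax_scale(table):
    """Scale every column of a 2-D array to [0, 1]; constant columns become 0."""
    table = np.asarray(table, dtype=float)
    mn = table.min(axis=0)         # shape (n_columns,)
    span = table.max(axis=0) - mn  # shape (n_columns,)
    # Replace zero spans by 1 so the division is defined; (table - mn) is 0 there anyway.
    safe_span = np.where(span == 0, 1.0, span)
    # (n_rows, n_columns) combined with (n_columns,) broadcasts over the rows.
    return (table - mn) / safe_span

demo = np.array([[1.0, 10.0, 5.0],
                 [2.0, 20.0, 5.0],
                 [3.0, 40.0, 5.0]])
scaled = minmax_scale(demo)
print(scaled)
```

In the wine data every column varies, so the guard never triggers there and the result matches the shorter expressions above.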

In [6]:
data_all


Out[6]:
array([[ 0.        ,  0.84210526,  0.1916996 , ...,  0.45528455,
         0.97069597,  0.56134094],
       [ 0.        ,  0.57105263,  0.2055336 , ...,  0.46341463,
         0.78021978,  0.55064194],
       [ 0.        ,  0.56052632,  0.3201581 , ...,  0.44715447,
         0.6959707 ,  0.64693295],
       ..., 
       [ 1.        ,  0.58947368,  0.69960474, ...,  0.08943089,
         0.10622711,  0.39728959],
       [ 1.        ,  0.56315789,  0.36561265, ...,  0.09756098,
         0.12820513,  0.40085592],
       [ 1.        ,  0.81578947,  0.66403162, ...,  0.10569106,
         0.12087912,  0.20114123]])

Write a CSV file with scaled values for later use in SC3


In [14]:
data_sc3 = 16+(data_all*22450)

In [17]:
np.savetxt("wine_data_scaled.csv", data_sc3, delimiter=",")
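A quick way to check what ends up in the file is a round trip through an in-memory buffer. This sketch assumes Python 3 and uses a tiny made-up array instead of data_all; the offset 16 and factor 22450 are the ones used above, so a scaled value of 0 becomes 16 and a scaled value of 1 becomes 22466. The fmt argument is an optional addition.

```python
import io
import numpy as np

scaled = np.array([[0.0, 0.5],
                   [1.0, 0.25]])
for_sc3 = 16 + scaled * 22450  # same offset and factor as above

buffer = io.StringIO()
# fmt="%.6f" writes fixed-point numbers instead of savetxt's default scientific notation.
np.savetxt(buffer, for_sc3, delimiter=",", fmt="%.6f")
print(buffer.getvalue())

buffer.seek(0)
restored = np.loadtxt(buffer, delimiter=",")
print(np.allclose(restored, for_sc3))
```

Fixed-point output may be easier to inspect by eye. Whether the reader on the SC3 side needs it has not been checked here.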

Check properties of data_all


In [13]:
data_all.shape


Out[13]:
(178, 14)

In [19]:
data_all[1,:]


Out[19]:
array([ 0.        ,  0.57105263,  0.2055336 ,  0.4171123 ,  0.03092784,
        0.32608696,  0.57586207,  0.51054852,  0.24528302,  0.27444795,
        0.26450512,  0.46341463,  0.78021978,  0.55064194])

In [20]:
data[1,:]


Out[20]:
array([  1.00000000e+00,   1.32000000e+01,   1.78000000e+00,
         2.14000000e+00,   1.12000000e+01,   1.00000000e+02,
         2.65000000e+00,   2.76000000e+00,   2.60000000e-01,
         1.28000000e+00,   4.38000000e+00,   1.05000000e+00,
         3.40000000e+00,   1.05000000e+03])

In [22]:
import time, sys
for i in range (14):
    print data[1,i]
    time.sleep(0.4)


1.0
13.2
1.78
2.14
11.2
100.0
2.65
2.76
0.26
1.28
4.38
1.05
3.4
1050.0

In [170]:
import time, sys
for i in range (14):
    print data_all[1,i]
    time.sleep(0.4)


0.0
0.571052631579
0.205533596838
0.417112299465
0.0309278350515
0.326086956522
0.575862068966
0.510548523207
0.245283018868
0.274447949527
0.264505119454
0.463414634146
0.78021978022
0.550641940086

Use oscgrain (function defined above)

Read out and play (sonify) one data point across all 14 dimensions


In [214]:
import time, sys
for i in range(14):
    oscgrain(200+(1500*data_all[123,i]))
    time.sleep(0.09)

Read-in all datapoints sequentially


In [215]:
for i in range(178):
    for ii in range (14):
        oscgrain(200+(1500*data_all[i,ii]))
        time.sleep(0.01) # 14 * 0.01 s = 0.14 s per data point
    time.sleep(0.2)
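The sleep values determine how long the whole sonification runs. As a rough sketch, the time spent sleeping in the nested loop can be computed up front, together with the frequency mapping 200 + 1500 * x used there. The helper names playback_seconds and to_freq are hypothetical, and the estimate ignores the time needed to build and send each message.

```python
def playback_seconds(n_points, n_dims, grain_gap=0.01, point_gap=0.2):
    """Time spent in time.sleep() by the nested loop (sending overhead ignored)."""
    return n_points * (n_dims * grain_gap + point_gap)

def to_freq(x, base=200.0, span=1500.0):
    """Map a scaled value in [0, 1] to a frequency in Hz, as in 200 + 1500 * x."""
    return base + span * x

total = playback_seconds(178, 14)
print(round(total, 2))             # seconds of sleeping for all 178 data points
print(to_freq(0.0), to_freq(1.0))  # lowest and highest grain frequency
```

For 178 data points with 14 dimensions this comes to roughly 60.5 seconds, with all grains between 200 and 1700 Hz.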

See the notebook "EXPLORATIONS WITH ARRAY READING USING OSCGRAIN"

Next steps

Is this related to parallel coordinates visualization?

Combine sonifying one data point across its dimensions with brushing in a scatterplot matrix?

Version information


In [74]:
%load_ext version_information
%version_information numpy, scipy, matplotlib, sympy, pyosc


The version_information extension is already loaded. To reload it, use:
  %reload_ext version_information
Out[74]:
Software    Version
Python      2.7.6 |Anaconda 1.9.1 (x86_64)| (default, Jan 10 2014, 11:23:15) [GCC 4.0.1 (Apple Inc. build 5493)]
IPython     1.2.1
OS          posix [darwin]
numpy       1.8.1
scipy       0.13.3
matplotlib  1.3.1
sympy       0.7.5
pyosc       0.3.5b-5294
Thu Apr 17 09:54:28 2014 CEST
